Abstract

We reinterpret some online greedy algorithms for a class of nonlinear "load-balancing" problems
as solving a mathematical program online. For example, we consider the problem of assigning jobs to
(unrelated) machines to minimize the sum of the $\alpha$-th powers of the loads plus assignment costs (the
online Generalized Assignment Problem); or choosing paths to connect terminal pairs to minimize
the $\alpha$-th powers of the edge loads (i.e., online routing with speed-scalable routers). We give analyses
of these online algorithms using the dual of the primal program as a lower bound for the optimal
algorithm, much in the spirit of online primal-dual results for linear problems.

We then observe that a wide class of uni-processor speed scaling problems (with essentially arbi- 
trary scheduling objectives) can be viewed as such load balancing problems with linear assignment 
costs. This connection gives new algorithms for problems that had resisted solutions using the dom- 
inant potential function approaches used in the speed scaling literature, as well as alternate, cleaner 
proofs for other known results. 



* Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Supported in part by NSF
awards CCF-0964474 and CCF-1016799. R.K. supported in part by an IBM Graduate Fellowship.

† Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA. kirk@cs.pitt.edu. Supported
in part by NSF grant CCF-0830558 and an IBM Faculty Award.



1 Introduction 



In this paper, we consider two online problems related to load balancing. We call the first problem
the Online Generalized Assignment Problem (OnGAP):

Definition of OnGAP: Jobs arrive one by one in an online manner, and the algorithm must fractionally
assign each job to the m machines. When a job j arrives, the online algorithm learns $\ell_{je}$, the amount
by which the load of machine e would increase for each unit of work of job j that is assigned to machine
e, and $c_{je}$, the assignment cost incurred for each unit of work of job j that is assigned to machine e.
The goal is to minimize the sum of the $\alpha$-th powers of the machine loads, plus the total assignment cost.

The version of OnGAP without assignment costs was studied by [AAF+97, AAG+95]. Our original
motivation for studying OnGAP is that it models a well-studied class of speed scaling problems with
sum cost scheduling objectives. In these problems, jobs arrive over time and must be scheduled on a
speed scalable processor, i.e., a processor that can run at any non-negative speed, and uses power $s^\alpha$
when run at speed s. The objective is the sum of the energy used by the processor plus a fractional
scheduling objective that is the sum over jobs of the "scheduling cost" of the individual jobs. These
speed scaling problems are a special case of OnGAP where the machines model the times that jobs can
be scheduled, and the assignment cost $c_{je}$ models the scheduling cost for scheduling a unit of job j at time
e. For example, one such scheduling objective is the sum of the fractional flow/response times squared.
For this objective, $c_{je}$ is $(e - r_j)^2$ for all times e that are at least the release time $r_j$ of job j, and infinite
otherwise. Another example is the problem of minimizing energy usage subject to deadline constraints,
introduced by [YDS95] and considered in followup papers [BKP07, BBCP11, BCPK09]. This problem
can be viewed as a special case of OnGAP, where each job j has an associated deadline $d_j$, and $c_{je}$ is
zero if $e \in [r_j, d_j]$ and infinite otherwise.

The second problem that we consider is a variation/generalization of OnGAP involving online routing
with speed scalable routers to minimize energy, which was previously considered in [AAF+97, AAZ11].

Definition of Online Routing with Speed Scalable Routers Problem: A sequence of requests
arrives one by one over time. Each request j has an associated source-sink pair $(s_j, t_j)$ in a network of
speed scalable routers, and the online algorithm must route flow between the source-sink pair, with an
objective of minimizing the total energy used by the network, where the energy incurred by an edge e
is the $\alpha$-th power of the load flowing through it.

For load balancing and online routing, it was known that natural online greedy algorithms, which assign
jobs to the machine(s) that minimize the increase in cost, can be shown to be $O_\alpha(1)$-competitive via an
exchange argument, and directly bounding the cost compared to the optimal cost [AAF+97, AAG+95].
(In fact, basically the same argument shows that the online greedy algorithm is $O_\alpha(1)$-competitive for
integer assignments.) Once we observe that speed scaling problems with sum scheduling objectives can
be reduced to OnGAP, it is not too difficult to see that the analysis technique in [AAG+95] can be used
to show that natural greedy speed scaling algorithms are $O_\alpha(1)$-competitive.

Our Contribution. In this paper, we first interpret these online problems as solving a mathematical 
program online, where the constraints arrive one-by-one, and in response to the arrival of a new con- 
straint, the online algorithm has to raise some of the primal variables so that the new constraint will be 
satisfied. The online algorithms that we consider raise the primal variables greedily. Our competitive 
analysis will use the dual function of the primal program as a lower bound for optimal. For analysis 
purposes, we assign a value to the dual variable corresponding to a constraint after the online algorithm 
has satisfied that constraint. Our goal is to set the dual variables so that the resulting dual solution 
is closely related to the online solution. (In our analyses, the settings of the dual variables naturally 
correspond to (some approximation for) the increase in the objective function.) We first show how to 
obtain fractional solutions to these problems, and subsequently show how similar ideas can be used for 
integer assignments.

Our analyses are very much in the spirit of the online primal dual technique for linear programs [BN07].
The main difference is that in the nonlinear setting, the dual is more complicated than in the linear
setting (where the dual is just another linear program). Indeed, in the nonlinear setting, one cannot
disentangle the objective and the constraints, since the dual itself contains a version of the primal
objective, and hence copies of the primal variables, within it. Consequently, the arguments for the
dual function in the nonlinear setting have a somewhat different feel to them than in the linear setting.
In particular, we need to set the dual variables $\lambda$, and then find minimizing settings for the copies of
the primal variables to get a good lower bound. For the load-balancing and speed scaling problems, 
this proceeds relatively naturally. But for the routing problem the dual minimization problem is itself 
non-trivial: in this case we first show how to write a "relaxed/decoupled" dual, which is potentially 
weaker than the original dual, but easier to argue about, and then set the variables of this relaxed dual 
to achieve a good lower bound. We hope this analytical technique will be useful for other problems. 

We then show how a wide class of uni-processor speed-scaling problems (with essentially arbitrary 
scheduling objectives) can be viewed as load balancing problems with linear assignment costs. This 
connection gives new algorithms for speed-scaling problems that had resisted solutions using the dom- 
inant potential function approaches used in the speed scaling literature, as well as alternate, cleaner 
analyses for some known results. For speed scaling problems, our analysis using duality is often cleaner
(compare, for example, the analysis of OA in [BKP07] to the analysis given here) and more widely
applicable (for example, to nonlinear scheduling objectives) than the potential function-based analyses.
Furthermore, we believe that much like the online primal-dual approach for linear problems, the
techniques presented here have potential for wide applicability in the design and analysis of online
algorithms for other non-linear optimization problems.



Roadmap: In Section 1.1 we discuss related work. In Section 2 we consider OnGAP. In Section 3 we
make some comments about the application of these results to speed scaling problems. In Section 4 we
consider the online routing problem. In Section 5 we show how to alter the water-filling algorithm to
obtain integer assignments with a similar competitive ratio, as well as a simple randomized rounding with
a slightly worse performance.

1.1 Related Work 

An $O(\alpha)$-competitive online greedy algorithm for the unrelated machines load-balancing problem in
the $L_\alpha$-norm was given by [AAF+97, AAG+95]; Caragiannis [Car08] gave better analyses and
improvements using randomization. An offline O(1)-approximation for this problem was given by [AE05]
and [AKMPS09], via solving the convex program and then rounding the solution in a correlated fashion.
For the routing problem, the $O(\alpha)^\alpha$-competitive algorithm can be inferred from the ideas of [AAF+97, AAG+95].
Followup work in a setting of a network consisting of routers with static and dynamic power components
can be found in [AAZ10, AAZ11].

There are two main lines of speed scaling research that fit within the framework that we consider here.
The first is the problem of minimizing energy used with deadline feasibility constraints. [YDS95]
proposed two online algorithms OA and AVR, and showed that AVR is $O_\alpha(1)$-competitive by reasoning
directly about the optimal schedule. [BKP07] introduced the use of potential functions for analyzing
online scheduling problems, and showed that OA and another algorithm BKP are $O_\alpha(1)$-competitive.
[BBCP11] gave a potential function analysis to show that AVR is $O_\alpha(1)$-competitive. [BCPK09]
introduced the algorithm qOA, and gave a potential function analysis to show that it has a better competitive
ratio than OA or AVR for smallish $\alpha$.

The second main line of speed scaling research is when the scheduling objective is total flow, or more
generally total weighted flow. [PUW08, AF07] gave offline algorithms for unit-weight unit-work jobs.
All of the work on online algorithms considers some variation of the "natural" algorithm, which uses
the "right" job selection algorithm from the literature on scheduling fixed speed processors, and sets
the power of the processor equal to the weight of the outstanding jobs. This speed scaling policy is
"natural" in that it balances the energy and scheduling costs. By reasoning directly about the energy
optimal schedule, [AF07] showed that a batched version of the natural algorithm is $O_\alpha(1)$-competitive
for unit-work unit-weight jobs. Using a potential function analysis, [BPS09] showed that a variation of
the natural algorithm is $O_\alpha(1)$-competitive for arbitrary-weight arbitrary-work jobs. For the objective
of total flow plus energy, the bound on the competitive ratio was improved in [LLTW08] by use of
a potential function tailored to integer flow instead of fractional flow. Using a potential function analysis,
[BCP09] showed a variation on the natural algorithm is O(1)-competitive for total flow plus energy for
an arbitrary power function, and a variation on the natural algorithm is scalable, for fractional weighted
flow plus energy for an arbitrary power function. [ALW10] improved the bound on the competitive ratio
for total flow plus energy. Nonclairvoyant algorithms are analyzed in [CEL+09, CLL10]. A relatively
recent survey of the algorithmic power management literature in general, and the speed scaling literature
in particular, can be found in [Alb10].



An extensive survey/tutorial on the online primal dual technique for linear problems, as well as the history
of the development of this technique, can be found in [BN07].



2 The Online Generalized Assignment Problem 

In this section we consider the Online Generalized Assignment Problem (OnGAP). If $x_{je}$
denotes the extent to which job j is assigned to machine e, then this problem can be expressed by the
following mathematical program:

    $\min \sum_e \Big( \sum_j \ell_{je} x_{je} \Big)^\alpha + \sum_e \sum_j c_{je} x_{je}$

    subject to $\sum_e x_{je} \ge 1$ for $j = 1, \ldots, n$

The dual function of the primal relaxation is then

    $g(\lambda) = \min_{x \ge 0} \Big( \sum_j \lambda_j + \sum_e \Big( \sum_j \ell_{je} x_{je} \Big)^\alpha + \sum_{j,e} c_{je} x_{je} - \sum_{j,e} \lambda_j x_{je} \Big)$    (2.1)

One can think of the dual problem as having the same instance as the primal, but where jobs are allowed
to be assigned to extent zero. In the objective, in addition to the same load cost $\sum_e ( \sum_j \ell_{je} x_{je} )^\alpha$ as in
the primal, a fixed cost of $\lambda_j$ is paid for each job j, and a payment of $\lambda_j - c_{je}$ is obtained for each unit
of job j assigned. It is well known that each feasible value of the dual function is a lower bound to the
optimal primal solution; this is weak duality [BV04].
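The weak-duality bound used throughout can be spelled out in one chain of inequalities. The sketch below is only an illustration: it assumes $\lambda \ge 0$, and the symbols $F$ (the primal objective) and $x^*$ (an arbitrary feasible primal solution) are names introduced here, not notation from the text.

```latex
% Assumptions (labels introduced for this sketch only):
%   F(x) = \sum_e (\sum_j \ell_{je} x_{je})^\alpha + \sum_{j,e} c_{je} x_{je}   (primal objective)
%   x^* \ge 0 with \sum_e x^*_{je} \ge 1 for all j                              (any feasible point)
%   \lambda_j \ge 0 for all j
\begin{align*}
g(\lambda)
  &= \min_{x \ge 0} \Big( \sum_j \lambda_j + F(x) - \sum_{j,e} \lambda_j x_{je} \Big) \\
  &\le \sum_j \lambda_j + F(x^*) - \sum_j \lambda_j \sum_e x^*_{je} \\
  &= F(x^*) + \sum_j \lambda_j \Big( 1 - \sum_e x^*_{je} \Big) \\
  &\le F(x^*).
\end{align*}
```

The last step uses $\lambda_j \ge 0$ together with feasibility of $x^*$; taking $x^*$ to be an optimal primal solution gives the lower bound.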



Online Greedy Algorithm Description: Let $\delta$ be a constant that we will later set to $1/\alpha^{\alpha-1}$. To
schedule job j, the load is increased on the machines for which the increase in cost will be the least,
assuming that energy costs are discounted by a factor of $\delta$, until a unit of job j is scheduled. More
formally, the value of all the primal variables $x_{je}$ for all the machines e that minimize

    $\delta \cdot \alpha \cdot \ell_{je} \Big( \sum_{i \le j} \ell_{ie} x_{ie} \Big)^{\alpha-1} + c_{je}$    (2.2)

are increased until all the work from job j is scheduled, i.e., $\sum_e x_{je} = 1$. Notice that $\alpha \cdot \ell_{je} ( \sum_{i \le j} \ell_{ie} x_{ie} )^{\alpha-1}$
is the rate at which the load cost is increasing for machine e, and $c_{je}$ is the rate at which assignment costs
are increasing for machine e. In other words, our algorithm fractionally assigns the job to the machines
on which the overall objective function increases at the least rate. Furthermore, observe that if the
algorithm begins assigning the job to some machine e, it does not stop raising the primal variable $x_{je}$
until the job is fully assigned (see footnote 1). By this monotonicity property, it is clear that all machines e for which
$x_{je} > 0$ have the same value of the above derivative when j is fully assigned. Now, for the purpose of
analysis, we set the value $\lambda_j$ to be the rate of increase of the objective value when we assigned the last
infinitesimal portion of job j. More formally, if e is any machine on which job j is run, i.e., if $x_{je} > 0$,
then

    $\lambda_j := \delta \cdot \alpha \cdot \ell_{je} \Big( \sum_{i \le j} \ell_{ie} x_{ie} \Big)^{\alpha-1} + c_{je}$    (2.3)

Intuitively, $\lambda_j$ is a surrogate for the total increase in objective function value due to our fractional
assignment of job j (we assign a total of 1 unit of job j, and $\lambda_j$ is set to be the rate at which objective
value increases).
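The greedy rule described above can be approximated by a small simulation. The following is a minimal sketch, not the algorithm as analyzed: it replaces the continuous water-filling by discrete chunks of size `step`, and the function names, the list-of-lists input layout and the `step` parameter are all assumptions made for illustration.

```python
def greedy_ongap(ell, cost, alpha, step=1e-3):
    """Discretized sketch of the greedy rule for fractional OnGAP.

    ell[j][e]  : load added to machine e per unit of job j (assumed layout)
    cost[j][e] : assignment cost per unit of job j on machine e
    Returns x, where x[j][e] is the fraction of job j placed on machine e.
    """
    delta = 1.0 / alpha ** (alpha - 1)  # discount factor delta from the text
    m = len(ell[0])
    load = [0.0] * m
    x = []
    for lj, cj in zip(ell, cost):  # jobs arrive one by one
        xj = [0.0] * m
        for _ in range(round(1.0 / step)):
            # expression (2.2): discounted marginal load cost + assignment cost
            rate = [delta * alpha * lj[e] * load[e] ** (alpha - 1) + cj[e]
                    for e in range(m)]
            e = min(range(m), key=rate.__getitem__)
            xj[e] += step
            load[e] += lj[e] * step
        x.append(xj)
    return x


def online_cost(x, ell, cost, alpha):
    """Objective value: sum of alpha-th powers of loads plus assignment cost."""
    n, m = len(x), len(ell[0])
    loads = [sum(ell[j][e] * x[j][e] for j in range(n)) for e in range(m)]
    # skip unused (job, machine) pairs so that infinite costs do not produce NaN
    assign = sum(cost[j][e] * x[j][e]
                 for j in range(n) for e in range(m) if x[j][e] > 0)
    return sum(l ** alpha for l in loads) + assign
```

With two identical machines and a single job, this sketch splits the job roughly evenly; a machine with infinite assignment cost is never used. The discretization only approximates the balancing of derivatives mentioned in footnote 1.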

We now move on to the analysis of our algorithm. To this end, let $\tilde{x}$ denote the final value of the $x_{je}$
variables for the online algorithm.

Algorithm Analysis. To establish that the online algorithm is $\alpha^\alpha$-competitive, note that it is sufficient
(by weak duality) to show that $g(\lambda)$ is at least $1/\alpha^\alpha$ times the cost of the online solution. Towards this
end, let $\hat{x}$ be the value of the minimizing x variables in $g(\lambda)$, namely

    $\hat{x} = \arg\min_{x \ge 0} \Big( \sum_j \lambda_j + \sum_e \Big( \sum_j \ell_{je} x_{je} \Big)^\alpha - \sum_{j,e} (\lambda_j - c_{je}) x_{je} \Big)$

Observe that the values $\hat{x}$ could be very different from the values $\tilde{x}$, and indeed the next few lemmas
characterize these values. Lemma 2.1 notes that $\hat{x}$ only has one job $\varphi(e)$ on each machine e,
and Lemma 2.2 shows how to determine $\varphi(e)$ and $\hat{x}_{\varphi(e)e}$. Then, in Lemma 2.3, we show that a feasible
choice for the job $\varphi(e)$ is the latest arriving job for which the online algorithm scheduled some bit of
work on machine e; let us denote this latest job by $\psi(e)$.

Lemma 2.1 There is a minimizing solution $\hat{x}$ such that if $\hat{x}_{je} > 0$, then $\hat{x}_{ie} = 0$ for all $i \ne j$.

Proof. Suppose for some machine e, there exist distinct jobs i and k such that $\hat{x}_{ie} > 0$ and $\hat{x}_{ke} > 0$.
Then by the usual argument of either increasing or decreasing these variables along the line that keeps
the load $\ell_{ie} \hat{x}_{ie} + \ell_{ke} \hat{x}_{ke}$ constant, we can keep the convex term $( \sum_j \ell_{je} \hat{x}_{je} )^\alpha$ fixed and not increase
the contribution $-\sum_j (\lambda_j - c_{je}) \hat{x}_{je}$ of the linear term. This allows us to either set $\hat{x}_{ie}$ or $\hat{x}_{ke}$ to zero
without increasing the objective. ■



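Lemma 2.2 Define $\varphi(e) = \arg\max_j \frac{\lambda_j - c_{je}}{\ell_{je}}$. Then
$\hat{x}_{\varphi(e)e} = \frac{1}{\ell_{\varphi(e)e}} \Big( \frac{\lambda_{\varphi(e)} - c_{\varphi(e)e}}{\alpha \, \ell_{\varphi(e)e}} \Big)^{1/(\alpha-1)}$ and $\hat{x}_{je} = 0$
for $j \ne \varphi(e)$. Moreover, the contribution of machine e towards $g(\lambda)$ is exactly
$(1 - \alpha) \Big( \frac{\lambda_{\varphi(e)} - c_{\varphi(e)e}}{\alpha \, \ell_{\varphi(e)e}} \Big)^{\alpha/(\alpha-1)}$.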



Proof. By Lemma 2.1 we know that in $\hat{x}$ there is at most one job (say j, if any) run on machine e.
Then the contribution of this machine to the value of $g(\lambda)$ is

    $(\ell_{je} \hat{x}_{je})^\alpha - (\lambda_j - c_{je}) \hat{x}_{je}$    (2.4)

Since $\hat{x}$ is a minimizer for $g(\lambda)$, we know that the partial derivative of the above term evaluates to zero.
This gives $\alpha \ell_{je} \cdot (\ell_{je} \hat{x}_{je})^{\alpha-1} - (\lambda_j - c_{je}) = 0$, or equivalently,
$\hat{x}_{je} = \frac{1}{\ell_{je}} \Big( \frac{\lambda_j - c_{je}}{\alpha \, \ell_{je}} \Big)^{1/(\alpha-1)}$. Substituting



1 It may, however, increase $x_{je}$ and $x_{je'}$ at different rates so as to balance the derivatives, where e and e' are both machines
which minimize expression (2.2).

this value of $\hat{x}_{je}$ into equation (2.4), the contribution of machine e towards the dual $g(\lambda)$ is

    $\Big( \frac{\lambda_j - c_{je}}{\alpha \, \ell_{je}} \Big)^{\alpha/(\alpha-1)} - \frac{\lambda_j - c_{je}}{\ell_{je}} \Big( \frac{\lambda_j - c_{je}}{\alpha \, \ell_{je}} \Big)^{1/(\alpha-1)} = (1 - \alpha) \Big( \frac{\lambda_j - c_{je}}{\alpha \, \ell_{je}} \Big)^{\alpha/(\alpha-1)}$

Hence, for each machine e, we want to choose the job j that minimizes this expression, which is
also the job j that maximizes the expression $(\lambda_j - c_{je})/\ell_{je}$ since $\alpha > 1$. This is precisely the job $\varphi(e)$
and the proof is hence complete. ■
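The closed form in Lemma 2.2 is easy to sanity-check numerically for a single machine and a single job. The sketch below is only an illustration; the function names and the scalar arguments `ell`, `lam`, `c` (standing for $\ell_{je}$, $\lambda_j$, $c_{je}$) are assumptions introduced here, and it presumes $\lambda_j > c_{je}$ and $\alpha > 1$.

```python
def machine_contribution(x, ell, lam, c, alpha):
    """Expression (2.4): contribution of one machine carrying one job at level x."""
    return (ell * x) ** alpha - (lam - c) * x


def closed_form(ell, lam, c, alpha):
    """Minimizer and minimum value of (2.4) as stated in Lemma 2.2.

    Assumes lam > c and alpha > 1 so that the fractional powers are well defined.
    """
    r = (lam - c) / (alpha * ell)
    x_hat = r ** (1.0 / (alpha - 1)) / ell
    value = (1 - alpha) * r ** (alpha / (alpha - 1))
    return x_hat, value
```

Because (2.4) is convex in $x$ for $\alpha > 1$, evaluating `machine_contribution` at the returned `x_hat` should reproduce `value`, and any other $x \ge 0$ should give a value that is at least as large.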

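Lemma 2.3 For all machines e, job $\psi(e)$ is a feasible choice for $\varphi(e)$.

Proof. The line of reasoning is the following:

    $\varphi(e) = \arg\max_j \frac{\lambda_j - c_{je}}{\ell_{je}} = \arg\max_j \Big( \delta \cdot \alpha \cdot \Big( \sum_{i \le j} \ell_{ie} \tilde{x}_{ie} \Big)^{\alpha-1} \Big) = \arg\max_j \Big( \sum_{i \le j} \ell_{ie} \tilde{x}_{ie} \Big) = \psi(e)$.

The first equality is the definition of $\varphi(e)$. For the second equality, observe that for any job k,

    $\lambda_k \le \delta \cdot \alpha \cdot \ell_{ke} \Big( \sum_{i \le k} \ell_{ie} \tilde{x}_{ie} \Big)^{\alpha-1} + c_{ke} \implies \frac{\lambda_k - c_{ke}}{\ell_{ke}} \le \delta \, \alpha \Big( \sum_{i \le k} \ell_{ie} \tilde{x}_{ie} \Big)^{\alpha-1}$.

The expression on the right is monotone increasing in $\sum_{i \le k} \ell_{ie} \tilde{x}_{ie}$, the load due to jobs up to (and
including) k. Moreover, it is maximized by the last job to assign fractionally to e (since the inequality
is strict for all other jobs). Since this last job is $\psi(e)$, the last equality follows. ■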

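Theorem 2.4 The online greedy algorithm is $\alpha^\alpha$-competitive.

Proof. By weak duality it is sufficient to show that $g(\lambda) \ge ON/\alpha^\alpha$, where ON denotes the cost of the
online solution $\tilde{x}$. Applying Lemma 2.2 to the expression for $g(\lambda)$ (equation (2.1)) and substituting the
contribution of each machine towards the dual, we get that

    $g(\lambda) = \sum_j \lambda_j + (1 - \alpha) \sum_e \Big( \frac{\lambda_{\varphi(e)} - c_{\varphi(e)e}}{\alpha \, \ell_{\varphi(e)e}} \Big)^{\alpha/(\alpha-1)}$    (2.5)

Now we consider only the first term $\sum_j \lambda_j$ and evaluate it.

    $\sum_j \lambda_j = \sum_{j,e} \lambda_j \tilde{x}_{je}$    (2.6)
    $= \sum_{j,e} \tilde{x}_{je} \Big( \delta \cdot \alpha \cdot \ell_{je} \Big( \sum_{i \le j} \ell_{ie} \tilde{x}_{ie} \Big)^{\alpha-1} + c_{je} \Big)$    (2.7)
    $= (\delta \cdot \alpha) \sum_e \sum_j \ell_{je} \tilde{x}_{je} \Big( \sum_{i \le j} \ell_{ie} \tilde{x}_{ie} \Big)^{\alpha-1} + \sum_{j,e} \tilde{x}_{je} c_{je}$    (2.8)
    $\ge \delta \sum_e \Big( \sum_j \ell_{je} \tilde{x}_{je} \Big)^\alpha + \sum_{j,e} \tilde{x}_{je} c_{je}$    (2.9)

Now consider the second term of equation (2.5). Note that if we substitute the value of $\lambda_{\varphi(e)}$ (taking
$\varphi(e) = \psi(e)$, which is feasible by Lemma 2.3), it evaluates
to $(1 - \alpha) \delta^{\alpha/(\alpha-1)} \sum_e ( \sum_j \ell_{je} \tilde{x}_{je} )^\alpha$. Putting the above two estimates together, we get

    $g(\lambda) \ge \delta \sum_e \Big( \sum_j \ell_{je} \tilde{x}_{je} \Big)^\alpha + \sum_{j,e} \tilde{x}_{je} c_{je} + (1 - \alpha) \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_j \ell_{je} \tilde{x}_{je} \Big)^\alpha$    (2.10)
    $= \Big( \delta + (1 - \alpha) \delta^{\alpha/(\alpha-1)} \Big) \sum_e \Big( \sum_j \ell_{je} \tilde{x}_{je} \Big)^\alpha + \sum_{j,e} \tilde{x}_{je} c_{je}$    (2.11)
    $\ge ON/\alpha^\alpha$    (2.12)

The final inequality is due to the choice of $\delta = 1/\alpha^{\alpha-1}$, which maximizes $\delta + (1 - \alpha) \delta^{\alpha/(\alpha-1)}$. ■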


As observed, e.g., in [AAG+95], this $O(\alpha)^\alpha$ result is the best possible, even for the (fractional) OnGAP
problem without any assignment costs. In Section 5, we show how to obtain an $O(\alpha)^\alpha$-competitive
algorithm for integer assignments by a very similar greedy algorithm, and dual-fitting, albeit with a
more careful analysis.

3 Application to Speed Scaling 

We now discuss the application of our results for OnGAP to some well-studied speed scaling problems.
In these problems a collection of jobs arrive over time. The j-th job arrives at time $r_j$, and has size/work
$p_j$. These jobs must be scheduled on a speed scalable processor that can run at any non-negative
speed. There is a convex function $P(s) = s^\alpha$ specifying the dynamic power used by the processor as
a function of speed s. The value of $\alpha$ is typically around 3 for CMOS based processors. Commonly,
one considers objectives $Q$ of the form $S + \beta E$, where $S$ is a scheduling objective, and $E$ is the energy
used by the system. Moreover, the scheduling objective $S$ is a fractional sum objective of the form
$\sum_j \sum_t c_{jt} x_{jt}$, where $c_{jt}$ is the cost of completing a unit of work of job j at time t, and $x_{jt}$ is
the amount of work completed at time t, or the corresponding integer sum objective $\sum_j \sum_t c_{jt} y_{jt}$,
where $y_{jt}$ indicates whether or not job j was completed at time t. Fractional scheduling objectives
are interesting in their own right (for example, in situations where the client gains some benefit from
the early partial completion of a job), and are often used in an intermediate step in the analysis of
algorithms for integer scheduling objectives.

Normally one thinks of the online scheduling algorithm as having two components: a job selection policy
to determine the job to run, and a speed scaling policy to determine the processor speed. However, one
gets a different view when one thinks of the online scheduler as solving online the following mathematical
programming formulation of the problem (which is an instance of the OnGAP problem):

    $\min \sum_t \Big( \sum_j x_{jt} \Big)^\alpha + \sum_j \sum_t c_{jt} x_{jt}$

    subject to $\sum_t x_{jt} \ge 1$ for $j = 1, \ldots, n$

Here the variables $x_{jt}$ specify how much work from job j is run at time t. Because we are initially
concerned with fractional scheduling objectives, we can assume without loss of generality that all jobs
have unit length. The arrival of a job j corresponds to the arrival of a constraint specifying that job j
must be completed. Greedily raising the primal variables corresponds to committing to complete the
work of job j in the cheapest possible way, given the previous commitments.

Two Special Cases. A well-studied speed-scaling problem in this class is when the scheduling
objective is energy minimization subject to deadline feasibility [YDS95, BKP07, BBCP11, BCPK09]: for
each job j there is a deadline $d_j$, and $c_{jt} = 0$ for $t \in [r_j, d_j]$ and is infinite otherwise. Our algorithm for
OnGAP is essentially equivalent to the algorithm Optimal Available (OA), introduced in [YDS95] and
shown to be $\alpha^\alpha$-competitive in [BKP07]. Specifically, the speeds set by both OA and our algorithm are
the same at all times, but the jobs that are run may be different, since OA uses Earliest Deadline First
for scheduling. Our analysis of the online greedy algorithm for OnGAP is an alternate, and simpler,
analysis of OA than the potential function analysis in [BKP07]. In this instance, our duality based
analysis is tight, as OA is no better than $\alpha^\alpha$-competitive [YDS95, BKP07].

Another well-studied special case is when the objective is total flow [AF07, BPS09, BCP09, LLTW08,
ALW10, CEL+09, CLL10]. That is, $c_{jt}$ equals $(t - r_j)$ for $t \ge r_j$ and is infinite otherwise. All prior
algorithms for this objective assume some variation of the balancing speed scaling algorithm that sets
the power equal to the (fractional) number/weight of unfinished jobs. Unlike in the deadline example above,
our OnGAP algorithm behaves differently than these balancing algorithms for the following reason. When a job arrives,
it only focuses on choosing assignments which minimize the rate of increase of the objective, and
this rule determines both the scheduling policy (in fact, the entire schedule of job j is decided upon
arrival) and the power usage over time. However, in the balancing algorithms the speed profile (and
hence power usage) is entirely determined by the scheduling policy, and the scheduling policies used
are typically the ones optimal for fractional flow like SJF. Hence it is likely that our algorithms will
actually have a different work profile from balancing algorithms.
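The two special cases above differ only in how the assignment costs $c_{jt}$ are filled in. As a rough illustration, the sketch below builds one row of costs per job under the assumption that time is discretized into integer slots $0, 1, \ldots$; the function names, the `horizon` parameter and the `power` parameter (1 for flow, 2 for flow squared) are hypothetical and not part of the text.

```python
import math


def deadline_costs(release, deadline, horizon):
    """c_jt = 0 for t in [r_j, d_j], infinite otherwise (deadline feasibility)."""
    return [0.0 if release <= t <= deadline else math.inf
            for t in range(horizon)]


def flow_costs(release, horizon, power=1):
    """c_jt = (t - r_j)^power for t >= r_j, infinite otherwise.

    power=1 models fractional flow; power=2 models flow squared.
    """
    return [float((t - release) ** power) if t >= release else math.inf
            for t in range(horizon)]
```

Each returned list plays the role of one row $c_{j\cdot}$ of the OnGAP instance, with time slots acting as machines and all $\ell_{jt} = 1$.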

A Note about our Approximation Guarantees. A closer examination of our analysis (especially
equation (2.11)) shows that our algorithm has a Lagrangian multiplier preserving property: we get that
our convex cost plus $\alpha^\alpha$ times the linear term is at most $\alpha^\alpha$ times the dual. This separation between the
linear and non-linear terms in the objective happens because the constraints are linear, and when we
compute the dual, the dual variables are involved in the linear terms whereas the convex terms in the
minimizer are identical to their expressions in the primal. In order to argue about the dual minimizer,
this somehow forces us to be exact on the linear terms. This can perhaps explain why our analysis
is tight for the deadline feasibility and load balancing problems [BKP07, AAF+97, AAG+95] (where
there are no linear terms), but is an $O(\alpha^\alpha)$-factor worse than the algorithm of [BCP09] for fractional
flow+energy (which has non-trivial linear terms).

Comparison to Previous Techniques. In most potential function-based analyses of speed scaling
problems in the literature, the potential function is defined to be the future cost for the online algorithm
to finish the jobs if the remaining sizes of the jobs were the lags of the jobs, where the lag of a job is how far the online
algorithm is behind on the job [IMP11]. A seemingly necessary condition to apply this potential
function technique is that there must be a relatively simple algebraic expression for the future cost
for the online scheduling algorithm starting from an arbitrary state. As it is not clear how to obtain such
an algebraic expression for the most obvious candidate algorithms for nonlinear scheduling objectives,
this to date has limited the application of this potential function method to speed scaling problems
with linear scheduling objectives. However, our dual-based analysis for OnGAP yields an online greedy
speed scaling algorithm that is $O_\alpha(1)$-competitive for any sum scheduling objective.

Our algorithm for OnGAP has the advantage that, at the release time of a job, it can commit to the
client exactly the times that each portion of the job will be run. One can certainly imagine situations
when this information would be useful to the client. Also the OnGAP analysis applies to a wider class
of machine environments than do the previous potential function based analyses in the literature. For
example, our analysis of the OnGAP algorithm can handle the case that the processor is unavailable at
certain times, without any modification to the analysis. This generality, however, has the disadvantage
that it gives sub-optimal bounds for some problems, such as when the scheduling objective is total flow.

By speeding up the processor by a $(1+\epsilon)$ factor, one can obtain an online speed scaling algorithm that one
can show, using known techniques [BLMSP06, BPS09], has competitive ratio at most $\min(\alpha^\alpha (1+\epsilon)^\alpha, \ldots)$
for the corresponding integer scheduling objective.



2 For $L_\alpha$ norms of flow on a single fixed speed processor, no potential function is required to prove scalability of natural
online algorithms [BP10]. For more complicated non-work conserving machine environments, there are analyses that use a
potential function that is a rough approximation of future costs [GIK+10, IM10]. But despite some effort, it has not been
clear how to extend these potential functions to apply to $L_\alpha$ norms of flow in the speed scaling setting.

4 Routing with Speed Scalable Routers

In this section we consider the following online routing problem. Routing requests in a graph/network
arrive over time. The j-th request consists of a source $s_j$, a sink $t_j$, and a flow requirement $f_j$. In
the unsplittable flow version of the problem the online algorithm must route $f_j$ units of flow along a
single $(s_j, t_j)$-path. In the splittable flow version of this problem, the online algorithm may partition the
$f_j$ units of flow among a collection of $(s_j, t_j)$-paths. In either case, we assume speed scalable network
elements (routers, or links, or both) that use power $\ell^\alpha$ when they have load $\ell$, where the load is the sum
of the flows through the element. We consider the objective of minimizing the aggregate power. We show
that an intuitive online greedy algorithm is $\alpha^\alpha$-competitive using the dual function of a mathematical
programming formulation as a lower bound to optimal.

High Level Idea: The proof will follow the same general approach as for OnGAP: we define dual
variables $\lambda_j$ for the demand pairs, but now the minimization problem (which is over flow paths, and not
just job assignments) is not so straightforward: the different edges on a path p might want to set $f(p)$
to different values. So we do something seemingly bad: we relax the dual to decouple the variables, and
allow each (edge, path) pair to choose its own "flow" value $f(p, e)$. And which of these should we use as
our surrogate for $f(p)$? We use a convex combination $\sum_{e \in p} h_e f(p, e)$, where the multipliers $h_e$ are
chosen based on the primal loads (!), hence capturing which edges are important and which are not.

4.1 The Algorithm and Analysis 

We first consider the splittable flow version of the problem. Therefore, we can assume without loss of
generality that all flow requirements are unit, and all sources and sinks are distinct (so we can associate
a unique request $j(p)$ with each path p). This will also allow us to order paths by the order in which flow
was sent along the paths. We now model the problem by the following primal optimization formulation:

    $\min \sum_e \Big( \sum_j \sum_{p \ni e : p \in P_j} f(p) \Big)^\alpha$

    subject to $\sum_{p \in P_j} f(p) \ge 1$ for $j = 1, \ldots, n$

where $P_j$ is the set of all $(s_j, t_j)$ paths, and $f(p)$ is a non-negative real variable denoting the amount of
flow routed on the path p. In this case, the dual function is:

    $g(\lambda) = \min_{f \ge 0} \Big( \sum_j \lambda_j + \sum_e \Big( \sum_j \sum_{p \ni e : p \in P_j} f(p) \Big)^\alpha - \sum_j \lambda_j \sum_{p \in P_j} f(p) \Big)$

One can think of the dual function as a routing problem with the same instance, but without the
constraints that at least a unit of flow must be routed for each request. In the objective, in addition
to energy costs, a fixed cost of $\lambda_j$ is paid for each request j, and a payment of $\lambda_j$ is received for each
unit of flow routed from $s_j$ to $t_j$.

Description of the Online Greedy Algorithm: To route flow for request j, flow is continuously
routed along the paths that will increase costs the least until enough flow is routed to satisfy the request.
That is, flow is routed along all $(s_j, t_j)$ paths p that minimize $\sum_{e \in p} \alpha \cdot ( \sum_{q \le p : q \ni e} f(q) )^{\alpha-1}$. For analysis
purposes, after the flow for request j is routed, we define

    $\lambda_j = \delta \sum_{e \in p} \alpha \cdot \Big( \sum_{q \le p : q \ni e} f(q) \Big)^{\alpha-1}$

where p is any path along which flow for request j was routed, and $\delta$ is a constant (later set to $1/\alpha^{\alpha-1}$).
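The routing rule above can be approximated by repeatedly sending a small amount of flow along a cheapest path with respect to the current marginal edge costs. The following is a minimal sketch under stated assumptions: flow is sent in discrete chunks of size `step` rather than continuously, requests have unit demand, the sink is reachable from the source, and the graph encoding (`adj[u]` as a list of `(v, edge_id)` pairs) and all function names are hypothetical.

```python
import heapq


def cheapest_path(adj, weight, s, t):
    """Dijkstra's algorithm; returns the edge ids of a minimum-weight s-t path.

    adj[u] is a list of (v, edge_id) pairs; weight[edge_id] must be non-negative.
    Assumes t is reachable from s and that node labels are mutually comparable.
    """
    dist = {s: 0.0}
    prev = {}
    heap = [(0.0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, eid in adj.get(u, []):
            nd = d + weight[eid]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, eid)
                heapq.heappush(heap, (nd, v))
    path, node = [], t
    while node != s:
        node, eid = prev[node]
        path.append(eid)
    return path[::-1]


def greedy_route(adj, num_edges, requests, alpha, step=1e-3):
    """Route one unit per (s, t) request in chunks; returns final edge loads."""
    load = [0.0] * num_edges
    for s, t in requests:
        for _ in range(round(1.0 / step)):
            # marginal energy cost alpha * load^(alpha - 1) of each edge
            weight = [alpha * l ** (alpha - 1) for l in load]
            for eid in cheapest_path(adj, weight, s, t):
                load[eid] += step
    return load
```

On a graph with two parallel edges between the source and the sink, a single request is split roughly evenly across the two edges, mirroring the balancing behaviour of the continuous algorithm. The total energy is then the sum of `l ** alpha` over the returned loads.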






The Analysis: Unfortunately, unlike the previous section for load balancing, it is not so clear how to
compute the dual function $g(\lambda)$ or its minimizer since the variables cannot be nicely decoupled as we did
there (per machine). In order to circumvent this difficulty, we consider the following relaxed function
$\hat{g}(\lambda, h)$, which does not enforce the constraint that flow must be routed along paths. This enables us
to decouple variables and then argue about the objective value. Indeed, let $\tilde{f}(p)$ be the final flow on
path p for the routing produced by the online algorithm. Let $h(e) = \alpha ( \sum_{p \ni e} \tilde{f}(p) )^{\alpha-1}$ be the incremental
cost of routing additional flow along edge e, and $h(p) = \sum_{e \in p} h(e)$ be the incremental cost of routing
additional flow along path p. We then define:

    $\hat{g}(\lambda, h) = \min_{f(p,e) \ge 0} \Big( \sum_j \lambda_j + \sum_e \Big( \sum_j \sum_{p \ni e : p \in P_j} f(p, e) \Big)^\alpha - \sum_j \lambda_j \sum_{p \in P_j} \sum_{e \in p} \frac{h(e)}{h(p)} f(p, e) \Big)$

Conceptually, $f(p, e)$ can be viewed as the load placed on edge e by request $j(p)$. In $\hat{g}(\lambda, h)$, the scheduler
has the option of increasing the load on individual edges $e \in p \in P_j$, but the income from edge e will
be scaled down by a factor of $h(e)/h(p)$ relative to the income achieved in $g(\lambda)$. In Lemma 4.1 we prove that $\hat{g}(\lambda, h)$ is a lower
bound for $g(\lambda)$. We then proceed as in the analysis of OnGAP. Lemma 4.2 shows how the minimizer
and value of $\hat{g}(\lambda, h)$ can be computed, and Lemma 4.3 shows how to bound some of the dual variables
in terms of the final online primal solution.

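Lemma 4.1 For the above setting of $h(\cdot)$, $\hat{g}(\lambda,h) \le g(\lambda)$.

Proof. We show that there is a feasible solution for $\hat{g}(\lambda,h)$ whose value is at most $g(\lambda)$. Let the value of $f(p,e)$
in $\hat{g}(\lambda,h)$ be the same as the value of $f(p)$ in a minimizer of $g(\lambda)$. Plugging these values for $f(p,e)$ into the expression
for $\hat{g}(\lambda,h)$, and simplifying, we get:

$$\begin{aligned}
\hat{g}(\lambda,h) &\le \sum_j \lambda_j + \sum_e \Big( \sum_j \sum_{p \ni e :\, p \in P_j} f(p) \Big)^{\alpha} - \sum_j \lambda_j \sum_{p \in P_j} f(p) \sum_{e \in p} \frac{h(e)}{h(p)} \\
&= \sum_j \lambda_j + \sum_e \Big( \sum_j \sum_{p \ni e :\, p \in P_j} f(p) \Big)^{\alpha} - \sum_j \lambda_j \sum_{p \in P_j} f(p) \\
&= g(\lambda)
\end{aligned}$$

The first equality holds by the definitions of $h(e)$ and $h(p)$, which give $\sum_{e \in p} h(e)/h(p) = 1$, and the second
equality holds by the optimality of $f(p)$. ■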

Lemma 4.2 There is a minimizer $f$ of $\hat{g}(\lambda,h)$ in which for each edge $e$, there is a single path $p(e)$ such
that $f(p,e)$ is positive, and $f(p(e),e) = \Big( \frac{\lambda_{j(p(e))}\, h(e)}{\alpha\, h(p(e))} \Big)^{1/(\alpha-1)}$.

Proof. The argument that $\hat{g}(\lambda,h)$ has a minimizer where each edge only has flow from one request
follows the same reasoning as in the proof of Lemma 2.1. Once we know only one request sends flow on
any edge we can use calculus to identify the minimizer and the value which achieves it. Indeed, we get
that $p(e) = \arg\max_{p \ni e} \frac{\lambda_{j(p)}}{h(p)}$, and the value of $f(p(e),e)$ is set so that the incremental energy cost
would just offset the incremental income from routing the flow, that is $\alpha f(p(e),e)^{\alpha-1} = \lambda_{j(p(e))} \frac{h(e)}{h(p(e))}$.
Solving for $f(p(e),e)$, the result follows. ■
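
The calculus step can be spelled out for one edge. In the sketch below, $x$ stands for $f(p(e),e)$, and $c_e$ and $\phi_e$ are shorthand introduced here only for this derivation; it assumes $\alpha > 1$, so that $\phi_e$ is convex.

```latex
\begin{aligned}
\phi_e(x) &= x^{\alpha} - c_e\, x, \qquad c_e = \lambda_{j(p(e))}\,\frac{h(e)}{h(p(e))}, \\
\phi_e'(x^*) = 0 \;&\Longleftrightarrow\; \alpha (x^*)^{\alpha-1} = c_e
  \;\Longleftrightarrow\; x^* = \Big(\frac{c_e}{\alpha}\Big)^{1/(\alpha-1)}, \\
\phi_e(x^*) &= \Big(\frac{c_e}{\alpha}\Big)^{\alpha/(\alpha-1)} - \alpha \Big(\frac{c_e}{\alpha}\Big)^{\alpha/(\alpha-1)}
  = -(\alpha-1)\Big(\frac{c_e}{\alpha}\Big)^{\alpha/(\alpha-1)}.
\end{aligned}
```

Summing $\phi_e(x^*)$ over all edges and adding $\sum_j \lambda_j$ gives the value of $\hat{g}(\lambda,h)$ at this minimizer.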

Lemma 4.3 $\lambda_{j(p(e))} \le \delta \cdot h(p(e))$.

Proof. $\lambda_{j(p(e))}$ is $\delta$ times the rate at which the energy cost was increasing for the online algorithm when
it routed the last bit of flow for request $j(p(e))$. $h(p(e))$ is the rate at which the energy cost would
increase for the online algorithm if additional flow was pushed along $p(e)$ after the last request was
satisfied. If $p(e)$ was a path on which the online algorithm routed flow, then the result follows from
the fact that the online algorithm never decreases the flow on any edge. If $p(e)$ was not a path on which
the online algorithm routed flow, then the result follows from the fact that at the time that the online
algorithm was routing flow for request $j(p(e))$, $p(e)$ was more costly than the selected paths, and the
cost cannot decrease subsequently due to monotonicity of the flows in the online solution. ■

Theorem 4.4 The online greedy algorithm is $\alpha^{\alpha}$-competitive.

Proof. We will show that $\hat{g}(\lambda,h)$ is at least the online cost ON divided by $\alpha^{\alpha}$, which is sufficient since
$\hat{g}(\lambda,h)$ is a lower bound to $g(\lambda)$ by Lemma 4.1, and since $g(\lambda)$ is a lower bound to optimal.



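$$\begin{aligned}
\hat{g}(\lambda,h) &= \min_{f} \Big( \sum_j \lambda_j + \sum_e \Big( \sum_j \sum_{p \ni e :\, p \in P_j} f(p,e) \Big)^{\alpha} - \sum_j \lambda_j \sum_{p \in P_j} \sum_{e \in p} f(p,e)\, \frac{h(e)}{h(p)} \Big) && (4.13) \\
&= \sum_j \lambda_j - (\alpha-1) \sum_e \Big( \frac{\lambda_{j(p(e))}\, h(e)}{\alpha \cdot h(p(e))} \Big)^{\alpha/(\alpha-1)} && (4.14) \\
&\ge \sum_j \lambda_j - (\alpha-1) \sum_e \Big( \frac{\delta \cdot h(e)}{\alpha} \Big)^{\alpha/(\alpha-1)} && (4.15) \\
&= \sum_j \lambda_j - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} && (4.16) \\
&= \sum_j \lambda_j \sum_{p \in P_j} f(p) - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} && (4.17) \\
&= \alpha\delta \sum_j \sum_{p \in P_j} f(p) \sum_{e \in p} \Big( \sum_{q \le p :\, q \ni e} f(q) \Big)^{\alpha-1} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} && (4.18) \\
&\ge \delta \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} && (4.19) \\
&= \frac{1}{\alpha^{\alpha}} \sum_e \Big( \sum_{p \ni e} f(p) \Big)^{\alpha} && (4.20) \\
&\ge ON/\alpha^{\alpha} && (4.21)
\end{aligned}$$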



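The equality in line (4.13) is the definition of $\hat{g}(\lambda,h)$. The equality in line (4.14) follows from Lemma
4.2. The inequality in line (4.15) follows from Lemma 4.3. The equality in line (4.16) follows from the
definition of $h(e)$. The equality in line (4.17) follows from the feasibility of $f$. The equality in line (4.18)
follows from the definition of $\lambda$. The inequality in line (4.19) follows from the convexity of $x^{\alpha}$: since the
online algorithm never decreases the flow on any edge, the increase in an edge's energy cost caused by a request is
at most the flow that request adds to the edge times the marginal cost on that edge after the request is routed.
The equality in line (4.20) follows from the definition of $\delta$. ■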
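
The role of $\delta$ in the last steps can be checked numerically. The sketch below (function names are hypothetical) evaluates the coefficient $\delta - (\alpha-1)\delta^{\alpha/(\alpha-1)}$ that multiplies the online cost in line (4.19), and confirms that the choice $\delta = 1/\alpha^{\alpha-1}$ makes it equal to $1/\alpha^{\alpha}$. Setting the derivative $1 - \alpha\delta^{1/(\alpha-1)}$ to zero shows this choice also maximizes the coefficient.

```python
def dual_fraction(alpha, delta):
    """Coefficient of the online cost: delta - (alpha-1) * delta^(alpha/(alpha-1)). Needs alpha > 1."""
    return delta - (alpha - 1) * delta ** (alpha / (alpha - 1))

def best_delta(alpha):
    """delta = alpha^-(alpha-1), the stationary point of dual_fraction in delta."""
    return alpha ** (-(alpha - 1))

for alpha in (1.5, 2.0, 3.0):
    d = best_delta(alpha)
    # The coefficient equals alpha^-alpha at this delta, giving the alpha^alpha ratio.
    print(alpha, dual_fraction(alpha, d), alpha ** (-alpha))
```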

While the above algorithm only gives a splittable routing, i.e., a fractional routing, we note that the
ideas of the next section, Section 5, can be used to obtain an $O_{\alpha}(1)$-competitive algorithm for integer
flow as well by using a slightly modified primal program (we have to handle non-uniform demands, and
also strengthen the basic convex program to prevent some trivial integrality gaps). The next section
describes how we can handle these issues for the load balancing problem.

5 Online Load Balancing: Integral Assignments 

For simplicity, let us consider online integer load balancing without assignment costs; it is easy to see
the extension to the other problems that we consider. In this problem each job has the values $\ell_{je}$, and
the goal is to integrally assign it to a single machine so as to minimize the sum $\sum_e \big(\sum_j X_{je} \ell_{je}\big)^{\alpha}$ where
$X_{je}$ is the indicator variable for whether job $j$ is assigned to machine $e$. The most natural reduction
to our general model OnGAP is to set all $c_{je}$'s to 0. However, the convex relaxation for this setting
has a large integrality gap with respect to integral solutions. For example, consider the case of just a
single job which splits into $m$ equal parts (where $m$ is the number of machines): the integer primal
optimal pays a factor of $m^{\alpha-1}$ times the fractional primal optimal. To handle this case, we add a fixed
assignment cost of $c_{je} = \ell_{je}^{\alpha}$ for assigning job $j$ to machine $e$. It is easy to see that the cost of an optimal
integral solution at most doubles in this relaxation. This is the convex program that we use for the rest
of this section.
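
The single-job example can be worked out numerically. The sketch below assumes $m$ identical machines and a job of size `ell` on each of them, and takes the added assignment cost to be $\ell_{je}^{\alpha}$; the function names are hypothetical. Without the assignment cost the ratio of integral to fractional cost is $m^{\alpha-1}$; with it, the ratio stays below 2.

```python
def fractional_cost(m, alpha, ell=1.0, with_assignment_cost=False):
    """Cost when one job of size ell is split evenly over m identical machines."""
    x = 1.0 / m
    load_cost = m * (ell * x) ** alpha
    assign_cost = m * x * ell ** alpha if with_assignment_cost else 0.0
    return load_cost + assign_cost

def integral_cost(alpha, ell=1.0, with_assignment_cost=False):
    """Cost when the same job is placed entirely on one machine."""
    load_cost = ell ** alpha
    assign_cost = ell ** alpha if with_assignment_cost else 0.0
    return load_cost + assign_cost

m, alpha = 8, 3.0
gap_plain = integral_cost(alpha) / fractional_cost(m, alpha)              # m^(alpha-1) = 64
gap_fixed = integral_cost(alpha, with_assignment_cost=True) / fractional_cost(
    m, alpha, with_assignment_cost=True)                                  # stays below 2
```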

5.1 Approach I: Integer Assignment 

In this section, we give an algorithm that achieves an $O(\alpha)^{\alpha}$ competitive ratio. Consider the
following greedy algorithm: when job $j$ arrives, it picks the machine $e$ that minimizes

$$\alpha\delta\, \ell_{je} \Big( \sum_{i<j} \ell_{ie} x_{ie} \Big)^{\alpha-1} + c_{je}$$

and sets $x_{je} = 1$. Moreover, set $\lambda_j$ for job $j$ to be precisely the quantity above. Let $x_{je}$ be the final
settings of the primal variables.
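
A minimal sketch of such a greedy rule follows. It rests on assumptions that should be read as such: the score of machine $e$ is taken to be $\alpha\delta\,\ell_{je}\,L_e^{\alpha-1} + c_{je}$, where $L_e$ is the load machine $e$ has when job $j$ arrives, the assignment cost is $c_{je} = \ell_{je}^{\alpha}$, and $\delta = (e(\alpha+1))^{1-\alpha}$. The function name and input format are hypothetical.

```python
import math

def greedy_integral_assign(jobs, alpha, delta=None):
    """jobs[j][e] = ell_je, the load job j adds if placed on machine e.

    Assumed score (see lead-in): alpha*delta*ell_je*load_e**(alpha-1) + ell_je**alpha.
    Returns (assignment, load, lambdas), where lambdas[j] is the minimum score for job j.
    """
    if delta is None:
        delta = (math.e * (alpha + 1)) ** (1 - alpha)  # assumed choice of delta
    m = len(jobs[0])
    load = [0.0] * m
    assignment, lambdas = [], []
    for ell in jobs:
        scores = [alpha * delta * ell[e] * load[e] ** (alpha - 1) + ell[e] ** alpha
                  for e in range(m)]
        e_best = min(range(m), key=scores.__getitem__)  # ties go to the lowest index
        assignment.append(e_best)
        lambdas.append(scores[e_best])
        load[e_best] += ell[e_best]
    return assignment, load, lambdas

# Three unit jobs on two identical machines, alpha = 2.
assignment, load, lambdas = greedy_integral_assign([[1.0, 1.0]] * 3, alpha=2.0)
```

With identical machines the rule simply balances loads; the assignment-cost term matters when a job's size differs across machines.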

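For the analysis, we again need to set the $\lambda_j$'s. For machine $e$, again consider $\varphi(e)$ and $\psi(e)$ as defined
in the previous proofs. We can no longer claim that $\psi(e)$ is a feasible setting for $\varphi(e)$. However, we can
claim that the load on machine $e$ that is seen by $\varphi(e)$ (and indeed, by any job $k$) is at most the load
seen by $\psi(e)$ when it arrived, plus the length $\ell_{\psi(e)e}$. Hence

$$\lambda_{\varphi(e)} \le \alpha\delta\, \ell_{\varphi(e)e} \Big( \sum_{i<\psi(e)} \ell_{ie} x_{ie} + \ell_{\psi(e)e} \Big)^{\alpha-1} + c_{\varphi(e)e} \;\Longrightarrow\; \frac{\lambda_{\varphi(e)} - c_{\varphi(e)e}}{\alpha\, \ell_{\varphi(e)e}} \le \delta \Big( \sum_{i \le \psi(e)} \ell_{ie} x_{ie} \Big)^{\alpha-1}. \qquad (5.22)$$

But since $\psi(e)$ is the last job on machine $e$, this last expression is exactly $\delta \big( \sum_i \ell_{ie} x_{ie} \big)^{\alpha-1}$. Now recall
the dual from (2.5). Observing that $\alpha > 1$, we can use the calculations we just did to get

$$\begin{aligned}
g(\lambda) &\ge \sum_j \lambda_j + (1-\alpha)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_i \ell_{ie} x_{ie} \Big)^{\alpha} \\
&= \alpha\delta \sum_{j,e} \ell_{je} x_{je} \Big( \sum_{i :\, i<j} \ell_{ie} x_{ie} \Big)^{\alpha-1} + \sum_{j,e} c_{je} x_{je} + (1-\alpha)\, \delta^{\alpha/(\alpha-1)} \sum_e \Big( \sum_j \ell_{je} x_{je} \Big)^{\alpha} \\
&= \sum_e \Big[ \alpha\delta \sum_j \ell_{je} x_{je} \Big( \sum_{i :\, i<j} \ell_{ie} x_{ie} \Big)^{\alpha-1} + \sum_j \ell_{je}^{\alpha} x_{je} + (1-\alpha)\, \delta^{\alpha/(\alpha-1)} \Big( \sum_j \ell_{je} x_{je} \Big)^{\alpha} \Big] \\
&\ge \frac{1}{e\,(e(\alpha+1))^{\alpha}} \sum_e \Big( \sum_j \ell_{je} x_{je} \Big)^{\alpha}
\end{aligned}$$

where the last inequality is obtained by applying Lemma 5.1 to the expression for each machine $e$. This
implies the $O(\alpha)^{\alpha}$ competitive ratio for the integral assignment algorithm.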

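Lemma 5.1 Given non-negative numbers $a_0, a_1, a_2, \ldots, a_T$ and $\delta = (e(\alpha+1))^{1-\alpha}$, we get

$$\alpha\delta \sum_{j \in [T]} a_j \Big( \sum_{i<j} a_i \Big)^{\alpha-1} + \sum_{i \in [T]} a_i^{\alpha} + (1-\alpha)\, \delta^{\alpha/(\alpha-1)} \Big( \sum_{j \in [T]} a_j \Big)^{\alpha} \;\ge\; \frac{1}{e\,(e(\alpha+1))^{\alpha}} \Big( \sum_{j \in [T]} a_j \Big)^{\alpha} \qquad (5.23)$$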



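Proof. First, consider the case when $\alpha \ge 2$: in this case, we bound the LHS of (5.23) when the
sequence of numbers is non-decreasing, and then we show the non-decreasing sequence makes this LHS
the smallest. We then consider the (easier) case of $\alpha \in [1,2)$.

Suppose $a_0 \le a_1 \le \cdots \le a_T$. Then $\sum_{j=0}^{T} a_j \big( \sum_{i<j} a_i \big)^{\alpha-1} \ge \sum_{j=0}^{T-1} a_j \big( \sum_{i \le j} a_i \big)^{\alpha-1}$, and it suffices to
(lower) bound the following term

$$\alpha\delta \sum_{j=0}^{T-1} a_j \Big( \sum_{i :\, i \le j} a_i \Big)^{\alpha-1} + \sum_{j=0}^{T} a_j^{\alpha} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \Big( \sum_{j=0}^{T} a_j \Big)^{\alpha} \qquad (5.24)$$

There are two cases, depending on the last term: whether $a_T \le \frac{1}{\alpha} \sum_{j=0}^{T-1} a_j$, or not.

• If $a_T \le \frac{1}{\alpha} \sum_{j=0}^{T-1} a_j$, we get $\sum_{j=0}^{T} a_j \le (1 + 1/\alpha) \sum_{j=0}^{T-1} a_j$. Now, consider the first term in (5.24):

$$\alpha\delta \sum_{j=0}^{T-1} a_j \Big( \sum_{i :\, i \le j} a_i \Big)^{\alpha-1} \ge \delta \Big( \sum_{i=0}^{T-1} a_i \Big)^{\alpha} \ge \frac{\delta}{(1+1/\alpha)^{\alpha}} \Big( \sum_{j=0}^{T} a_j \Big)^{\alpha} \ge \frac{\delta}{e} \Big( \sum_{i=0}^{T} a_i \Big)^{\alpha}.$$

(The last inequality used the fact that $(1 + 1/\alpha)^{\alpha}$ approaches $e$ from below.) Finally, plugging
this back into (5.24), ignoring the second sum, and using $\delta = (e(\alpha+1))^{1-\alpha}$, we can lower bound
the expression of (5.24) by $\big( \sum_{i=0}^{T} a_i \big)^{\alpha}$ times

$$\frac{\delta}{e} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} = \frac{1}{e\,(e(\alpha+1))^{\alpha-1}} - \frac{\alpha-1}{(e(\alpha+1))^{\alpha}} = \frac{(\alpha+1) - (\alpha-1)}{(e(\alpha+1))^{\alpha}} = \frac{2}{(e\alpha)^{\alpha} (1+1/\alpha)^{\alpha}} \ge \frac{2}{e\,(e\alpha)^{\alpha}}.$$

• In case $a_T > \frac{1}{\alpha} \sum_{j=0}^{T-1} a_j$, we get $\sum_j a_j = \sum_{j<T} a_j + a_T \le (1 + \alpha)\, a_T$. Now using just the single
term $a_T^{\alpha}$ from the first two summations in (5.24), we can lower bound it by

$$a_T^{\alpha} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} (1+\alpha)^{\alpha} a_T^{\alpha} = a_T^{\alpha} \Big( 1 - \frac{(\alpha-1)(1+\alpha)^{\alpha}}{(e(\alpha+1))^{\alpha}} \Big) = a_T^{\alpha} \Big( 1 - \frac{\alpha-1}{e^{\alpha}} \Big).$$

This is at least $a_T^{\alpha}/2 \ge \frac{1}{2(1+\alpha)^{\alpha}} \big( \sum_j a_j \big)^{\alpha} \ge \frac{1}{2(e\alpha)^{\alpha}} \big( \sum_j a_j \big)^{\alpha}$.

So in either case the inequality of the statement of the Lemma is satisfied.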
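
The constants in the two cases can be sanity-checked numerically. The sketch below (hypothetical function names) evaluates $\delta/e - (\alpha-1)\delta^{\alpha/(\alpha-1)}$ for $\delta = (e(\alpha+1))^{1-\alpha}$ and the factor $1 - (\alpha-1)/e^{\alpha}$ from the second case, for a few values of $\alpha \ge 2$.

```python
import math

def case1_constant(alpha):
    """delta/e - (alpha-1) * delta^(alpha/(alpha-1)) with delta = (e(alpha+1))^(1-alpha)."""
    delta = (math.e * (alpha + 1)) ** (1 - alpha)
    return delta / math.e - (alpha - 1) * delta ** (alpha / (alpha - 1))

def case2_constant(alpha):
    """The factor 1 - (alpha-1)/e^alpha multiplying a_T^alpha in the second case."""
    return 1 - (alpha - 1) / math.e ** alpha

for alpha in (2.0, 3.0, 5.0, 10.0):
    closed_form = 2 / (math.e * (alpha + 1)) ** alpha
    # case1_constant matches 2/(e(alpha+1))^alpha, and case2_constant is at least 1/2.
    print(alpha, case1_constant(alpha), closed_form, case2_constant(alpha))
```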

Now to show that the non-decreasing sequence makes the LHS smallest for $\alpha \ge 2$. Only the first
summation depends on the order, so focus on $\sum_{j \in [T]} a_j \big( \sum_{i<j} a_i \big)^{\alpha-1}$. Suppose $a_k > a_{k+1}$; then let
$a_k = l$, $a_{k+1} = s$; moreover, we can scale the numbers so that $\sum_{i<k} a_i = 1$. Now swapping $a_k$ and $a_{k+1}$
causes a decrease of

$$(a_k - a_{k+1}) \cdot 1^{\alpha-1} + a_{k+1} (1 + a_k)^{\alpha-1} - a_k (1 + a_{k+1})^{\alpha-1} = (l - s) + s(1 + l)^{\alpha-1} - l(1 + s)^{\alpha-1}.$$

And for $\alpha \ge 2$ this quantity is non-negative.
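
The swap quantity is easy to probe numerically. The sketch below (the function name is hypothetical) computes the decrease for given $l \ge s$ with the prefix sum scaled to 1. It is non-negative on a grid of values when $\alpha \ge 2$, and it can be negative when $\alpha < 2$, which is why that range is treated separately.

```python
def swap_decrease(l, s, alpha):
    """Decrease in sum_j a_j * (sum_{i<j} a_i)^(alpha-1) when adjacent entries
    a_k = l, a_{k+1} = s are swapped and the prefix sum before a_k equals 1."""
    before = l * 1.0 ** (alpha - 1) + s * (1 + l) ** (alpha - 1)
    after = s * 1.0 ** (alpha - 1) + l * (1 + s) ** (alpha - 1)
    return before - after

print(swap_decrease(2.0, 1.0, 3.0))  # 2.0, matching s*l*(l-s) for alpha = 3
print(swap_decrease(2.0, 1.0, 1.5))  # negative: the swap argument needs alpha >= 2
```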

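Finally, for the case $\alpha \in [1,2)$. Note that $\alpha\delta \le 1$ for our choice of $\delta$, so the LHS of (5.23) is at least

$$\alpha\delta \sum_{i \in [T]} \Big( a_i \Big( \sum_{j<i} a_j \Big)^{\alpha-1} + a_i^{\alpha} \Big) - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \Big( \sum_{i \in [T]} a_i \Big)^{\alpha} \;\ge\; \alpha\delta \sum_{i \in [T]} a_i \Big( \sum_{j \le i} a_j \Big)^{\alpha-1} - (\alpha-1)\, \delta^{\alpha/(\alpha-1)} \Big( \sum_{i \in [T]} a_i \Big)^{\alpha}.$$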



The second inequality used the fact that $a^{\beta} + b^{\beta} \ge (a + b)^{\beta}$ for $\beta \in (0,1)$. Now the proof proceeds as
usual and gives us the desired $O(e\alpha)^{\alpha}$ bound. ■

5.2 Approach II: Randomized Rounding



We now explain a different way of obtaining integer solutions: by rounding (in an online fashion) the
fractional solutions obtained for the problems that we consider into integral solutions. While this gives a
weaker result, it is a simple strategy that may be useful in some contexts.

Suppose we have a fractional solution w.r.t. the above parameters (after including the assignment
cost). While it is known that the convex programming formulation we use has an integrality gap
of 2 [AE05, AKMPS09], these proofs use correlated rounding procedures which we currently are not
able to implement online. Instead we analyze the simple online rounding procedure that independently
and randomly rounds the fractional assignment. Indeed, suppose we independently assign each job $j$ to
a machine $e$ with probability $x_{je}$. Denote the integer assignment induced by this random experiment
by $Y_{je}$. Let $L_e = \sum_j \ell_{je} Y_{je}$ denote the random load on machine $e$ after the independent randomized
rounding. Note that $L_e$ is a sum of non-negative and independent random variables. We can
now use the following inequality (for bounding higher moments of sums of random variables) due to
Rosenthal [Ros70, JSZ85] to get that



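$$\mathbb{E}[L_e^{\alpha}]^{1/\alpha} \le K_{\alpha} \max \Big( \sum_j \mathbb{E}[\ell_{je} Y_{je}],\; \Big( \sum_j \mathbb{E}[\ell_{je}^{\alpha} Y_{je}^{\alpha}] \Big)^{1/\alpha} \Big)$$

where $K_{\alpha} = O(\alpha / \log \alpha)$. However we know that $\sum_j \mathbb{E}[\ell_{je} Y_{je}] = \sum_j \ell_{je} x_{je}$, and that
$\sum_j \mathbb{E}[\ell_{je}^{\alpha} Y_{je}^{\alpha}] = \sum_j \mathbb{E}[\ell_{je}^{\alpha} Y_{je}] = \sum_j \ell_{je}^{\alpha} x_{je}$. Substituting this back in and using $(a + b)^{\alpha} \le 2^{\alpha-1} (a^{\alpha} + b^{\alpha})$, we get

$$\mathbb{E}[L_e^{\alpha}] \le \Big( K_{\alpha} \max \Big( \sum_j \ell_{je} x_{je},\; \Big( \sum_j \ell_{je}^{\alpha} x_{je} \Big)^{1/\alpha} \Big) \Big)^{\alpha} \le 2^{\alpha-1} K_{\alpha}^{\alpha} \Big[ \Big( \sum_j \ell_{je} x_{je} \Big)^{\alpha} + \sum_j c_{je} x_{je} \Big].$$

Summing over all $e$, we infer that $\mathbb{E}[\sum_e L_e^{\alpha}]$ is at most $(2K_{\alpha})^{\alpha}$ times the value of the online fractional
solution objective, and hence at most $(2\alpha K_{\alpha})^{\alpha} = O(\alpha^2 / \log \alpha)^{\alpha}$ times the integer optimum by
Theorem 2.4. (Note that the results of the previous section, and those of [AAG+95, Car08] give
$O(\alpha)^{\alpha}$-competitive online algorithms for the integer case.)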
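
The independent rounding step can be sketched as follows. This is an illustrative sketch under assumptions: `x[j][e]` is a fractional assignment whose rows sum to 1, `ell[j][e]` holds the values $\ell_{je}$, the assignment cost is taken to be $\ell_{je}^{\alpha}$, and all function names are hypothetical. It uses `random.Random.choices` from the Python standard library to sample one machine per job.

```python
import random

def round_independently(x, rng):
    """Assign each job j to machine e with probability x[j][e], independently across jobs."""
    return [rng.choices(range(len(row)), weights=row, k=1)[0] for row in x]

def rounded_cost(ell, assignment, alpha):
    """sum_e L_e^alpha for the integral assignment."""
    load = [0.0] * len(ell[0])
    for j, e in enumerate(assignment):
        load[e] += ell[j][e]
    return sum(L ** alpha for L in load)

def fractional_objective(ell, x, alpha):
    """sum_e (sum_j ell_je x_je)^alpha + sum_{j,e} ell_je^alpha x_je (assumed c_je = ell_je^alpha)."""
    n, m = len(x), len(ell[0])
    load_term = sum(sum(ell[j][e] * x[j][e] for j in range(n)) ** alpha for e in range(m))
    assign_term = sum(ell[j][e] ** alpha * x[j][e] for j in range(n) for e in range(m))
    return load_term + assign_term

rng = random.Random(0)
ell = [[1.0, 1.0]] * 4
x = [[0.5, 0.5]] * 4
samples = [rounded_cost(ell, round_independently(x, rng), 2.0) for _ in range(1000)]
estimate = sum(samples) / len(samples)  # Monte Carlo estimate of E[sum_e L_e^alpha]
```

Comparing `estimate` with `fractional_objective(ell, x, 2.0)` gives an empirical feel for the loss from rounding on a particular instance; it does not replace the moment bound above.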



6 Conclusion 



The online primal-dual technique (surveyed in [BN07]) has proven to be a widely and systematically
applicable method to analyze online algorithms for problems expressible by linear programs. This paper
develops an analogous technique to analyze online algorithms for problems expressible by nonlinear
programs. The main difference is that in the nonlinear setting one cannot disentangle the objective
and the constraints in the dual, and hence the arguments for the dual have a somewhat different feel
to them than in the linear setting. We apply this technique to several natural nonlinear covering
problems, most notably obtaining competitive analyses for greedy algorithms for uniprocessor speed
scaling problems with essentially arbitrary scheduling objectives that researchers were not previously
able to analyze using the prevailing potential function based analysis techniques.

Independently and concurrently with this work, Anand, Garg and Kumar [AGK12] obtained results that
are in the same spirit as the results obtained here. Most notably, they showed how to use nonlinear
duality to analyze a greedy algorithm for a multiprocessor speed-scaling problem involving minimizing
flow plus energy on unrelated machines. More generally, [AGK12] showed how duality based analyses
could be given for several scheduling algorithms that were analyzed in the literature using potential
functions.